Online-Academy
Look, Read, Understand, Apply

Data Mining And Data Warehousing

Search Engine

A search engine is software that helps users find the data and information they need. It takes a user's query as input, finds pages on the World Wide Web (WWW) that are relevant to the query, and returns them to the user. Users can search for documents, videos, images, web pages, and other content with a search engine.

How a Search Engine Works
Software modules called web crawlers (also called spiders) crawl the web ahead of time, following links from page to page and collecting the pages they find. The collected pages are organized in a database (an index) so they can be searched quickly. When a user submits a query, the search engine looks it up in this database rather than scanning the live web, and returns the most relevant pages. Because crawling and indexing happen in advance, in the background, results are returned very fast.
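A minimal sketch of the crawling step, assuming a tiny hypothetical in-memory "web" (the `WEB` dict) in place of real HTTP requests and HTML parsing. Starting from seed URLs, the crawler visits pages breadth-first and queues the links it finds on each page. The names `WEB` and `crawl` are illustrative only.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links).
# A real crawler would download pages over HTTP and extract links from the HTML.
WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("crawlers follow links between pages", ["c.example"]),
    "c.example": ("an index makes search fast", ["a.example", "d.example"]),
    "d.example": ("pages about data mining", []),
}


def crawl(seeds, web, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    Returns a dict of URL -> page text, in the order the pages were visited.
    """
    visited = {}
    frontier = deque(seeds)  # URLs waiting to be fetched
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited or url not in web:
            continue  # skip pages already fetched and links that lead nowhere
        text, links = web[url]
        visited[url] = text      # "download" the page
        frontier.extend(links)   # queue the links found on it
    return visited


if __name__ == "__main__":
    print(list(crawl(["a.example"], WEB)))
```

Because `frontier` is a queue, pages closer to the seeds are visited first; the `max_pages` limit keeps the crawl bounded, as a real crawler must be.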

Architecture of a Search Engine
A search engine has three main components:

  • Web Crawlers: Software components that crawl (visit) web pages by following links and collect their content.
  • Database: The information collected by the web crawlers is stored in a database, usually organized as an index, for fast access.
  • Search Interface: The user interface through which users submit queries to the search engine, typically a form where the user types a query.
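The Database and Search Interface components can be sketched as an inverted index (a map from each word to the pages that contain it) plus a function that answers queries against it. This is a minimal sketch under stated assumptions: the pages in the hypothetical `PAGES` dict are taken as already crawled, words are split on whitespace, and a page matches only if it contains every query word. Ranking is deliberately left out.

```python
from collections import defaultdict

# Hypothetical crawled pages (URL -> text), as a crawler might have collected them.
PAGES = {
    "a.example": "search engines crawl the web",
    "b.example": "crawlers follow links between pages",
    "c.example": "an index makes search fast",
}


def build_index(pages):
    """Database component: an inverted index mapping each word to the URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Search-interface component: return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # Intersect the URL sets of all query words (AND semantics).
    results = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(results)


if __name__ == "__main__":
    index = build_index(PAGES)
    print(search(index, "search"))       # ['a.example', 'c.example']
    print(search(index, "search fast"))  # ['c.example']
```

Looking words up in the index avoids scanning every page for each query, which is why the crawled data is organized before any query arrives.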

How are pages ranked by a search engine?
Different search engines use different algorithms to rank pages, so the results produced by one search engine will differ from those of another. However, certain factors are commonly used by search engines to rank pages:

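  • On-page factors: properties of the page itself
  • Off-page factors: signals from outside the page, such as links and mentions on other sites
On-page factors:
  • Content of the page
  • Title and meta tags
  • URL structure
  • Keywords used in the page (and how often they appear)
  • XML sitemap
  • Heading tags
Off-page factors:
  • Quality of links (e.g., no broken links)
  • Mentions and links in blogs, comments, and article directories
  • Link exchange
  • Social networking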
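To make the factors concrete, here is a toy scoring sketch. It is not how any real search engine ranks pages: the `Page` fields, the `WEIGHTS` values, and the `KEYWORD_CAP` limit are hypothetical choices made only to show on-page signals (title, headings, keyword use) being combined with an off-page signal (incoming links) and a penalty for broken links.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page record holding the signals used below."""
    url: str
    title: str
    headings: list = field(default_factory=list)
    content: str = ""
    inbound_links: int = 0  # off-page: links pointing to this page from other sites
    broken_links: int = 0   # links on the page that no longer resolve


# Hypothetical weights for illustration only; real search engines do not publish theirs.
WEIGHTS = {"title": 3.0, "heading": 2.0, "content": 1.0, "inbound": 0.5, "broken": 1.0}
KEYWORD_CAP = 3  # repeating a keyword more than this many times earns nothing extra


def score(page, query):
    """Toy relevance score combining on-page and off-page factors for a query."""
    terms = query.lower().split()
    title_words = page.title.lower().split()
    heading_words = " ".join(page.headings).lower().split()
    content_words = page.content.lower().split()
    total = 0.0
    for term in terms:
        total += WEIGHTS["title"] * (term in title_words)      # on-page: title
        total += WEIGHTS["heading"] * (term in heading_words)  # on-page: heading tags
        # on-page: keyword use, capped so keyword stuffing stops helping
        total += WEIGHTS["content"] * min(content_words.count(term), KEYWORD_CAP)
    total += WEIGHTS["inbound"] * page.inbound_links  # off-page: incoming links
    total -= WEIGHTS["broken"] * page.broken_links    # penalty for broken links
    return total


def rank(pages, query):
    """Return page URLs ordered from highest to lowest score."""
    return [p.url for p in sorted(pages, key=lambda p: score(p, query), reverse=True)]


SAMPLE_PAGES = [
    Page("a.example", "Data Mining Basics", ["What is data mining"],
         "data mining finds patterns in data", inbound_links=2),
    Page("b.example", "Cooking Tips", ["Recipes"],
         "data data data data data", broken_links=1),
]

if __name__ == "__main__":
    print(rank(SAMPLE_PAGES, "data mining"))  # ['a.example', 'b.example']
```

The second sample page repeats "data" five times, but the cap and the broken-link penalty keep it well below the page whose title and headings actually match the query.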

Damping Factor
A random web surfer who clicks on links at random will eventually get bored and stop following links. The damping factor (d) is the probability that, at any step, the random surfer continues by following a link on the current page. The probability that the surfer instead jumps to a randomly chosen web page is 1 - d.
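The damping factor comes from PageRank, the link-analysis algorithm originally used by Google. With N pages, the rank of a page p is PR(p) = (1 - d) / N + d × Σ PR(q) / L(q), where the sum runs over every page q that links to p and L(q) is the number of outgoing links on q. The sketch below computes this by power iteration. It assumes every link target is also a key of `links`, and it treats a page with no outgoing links as if the surfer jumps to any page (a common convention, not the only one). The default d = 0.85 is the value commonly used in the literature.

```python
def pagerank(links, d=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank with damping factor d.

    links: dict mapping each page to the list of pages it links to
           (every target is assumed to be a key of the dict).
    d:     probability that the random surfer follows a link;
           with probability 1 - d the surfer jumps to a random page.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Random jumps give every page an equal share of (1 - d).
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            out = links[p]
            if out:
                # Following a link: p splits d * rank[p] evenly among its links.
                share = d * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page: assume the surfer jumps to any page uniformly.
                for q in pages:
                    new[q] += d * rank[p] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break  # ranks have stopped changing
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, value in sorted(pagerank(graph).items()):
        print(page, round(value, 4))
```

For this three-page graph the ranks come out at roughly A 0.3878, B 0.2148, C 0.3974: C ranks highest because both A and B link to it. The ranks always sum to 1, since they form a probability distribution over where the random surfer is.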